Look Who's Talking: Speaker Detection using Video and Audio Correlation

Authors

  • Ross Cutler
  • Larry S. Davis
Abstract

The visual motion of the mouth and the corresponding audio data generated when a person speaks are highly correlated. This fact has been exploited for lip/speechreading and for improving speech recognition. We describe a method of automatically detecting a talking person (both spatially and temporally) using video and audio data from a single microphone. The audio-visual correlation is learned using a time-delay neural network, which is then used to perform a spatio-temporal search for a speaking person. Applications include video conferencing, video indexing, and improving human-computer interaction (HCI). An example HCI application is provided.
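The idea of searching the image for the region whose motion tracks the audio can be illustrated with a minimal sketch. This is not the authors' method: the abstract states that the correlation is learned with a neural network, whereas the sketch below swaps in a plain Pearson correlation between per-block motion energy and per-frame audio energy. The function names, the block size, and the choice of absolute frame difference as the motion feature are all assumptions made for illustration.

```python
import numpy as np

def block_motion_energy(frames, block=8):
    """Mean absolute frame difference per block.

    frames: array of shape (T, H, W), grayscale.
    Returns an array of shape (T-1, H//block, W//block).
    """
    diff = np.abs(np.diff(frames.astype(np.float64), axis=0))
    t, h, w = diff.shape
    hb, wb = h // block, w // block
    diff = diff[:, :hb * block, :wb * block]  # drop ragged edges
    return diff.reshape(t, hb, block, wb, block).mean(axis=(2, 4))

def correlation_map(motion, audio):
    """Pearson correlation over time between each block's motion and the audio."""
    m = motion - motion.mean(axis=0)
    a = audio - audio.mean()
    num = np.tensordot(a, m, axes=(0, 0))  # shape (hb, wb)
    den = np.sqrt((a ** 2).sum() * (m ** 2).sum(axis=0))
    # Blocks with no motion at all get correlation 0 instead of NaN.
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

def locate_speaker(frames, audio_energy, block=8):
    """Return the (row, col) block index with the highest correlation, and the map."""
    motion = block_motion_energy(frames, block)
    # Frame differences start at frame 1, so drop the first audio sample.
    audio = np.asarray(audio_energy, dtype=np.float64)[1:]
    corr = correlation_map(motion, audio)
    idx = np.unravel_index(np.argmax(corr), corr.shape)
    return idx, corr
```

Applying `locate_speaker` to successive short windows of frames, rather than to the whole clip, would extend the spatial search to a temporal one, since a window with no high-correlation block suggests that nobody visible is speaking during it.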

Similar resources

"Look who's talking!" Gaze Patterns for Implicit and Explicit Audio-Visual Speech Synchrony Detection in Children With High-Functioning Autism.

Conversation requires integration of information from faces and voices to fully understand the speaker's message. To detect auditory-visual asynchrony of speech, listeners must integrate visual movements of the face, particularly the mouth, with auditory speech information. Individuals with autism spectrum disorder may be less successful at such multisensory integration, despite their demonstra...

Dependent Video Indexing Based on Audio-Visual Interactions

A content-based video indexing method is presented in this paper that aims at temporally indexing a video sequence according to the actual speaker. This is achieved by the integration of audio and visual information. Audio analysis leads to the extraction of a speaker identity label versus time diagram. Visual analysis includes scene cut detection, face shot determination, mouth region extracti...

Talking head detection by likelihood-ratio test

Detecting accurately when a person whose face is visible in an audio-visual medium is the audible speaker is an enabling technology with a number of useful applications. These include fused audio/visual speaker recognition, AV (audio/visual) segmentation and diarization as well as AV synchronization. The likelihood-ratio test formulation and feature signal processing employed here allow the use...

Speaker Tracking Using an Audio-visual Particle Filter

We present an approach for tracking a lecturer during the course of his speech. We use features from multiple cameras and microphones, and process them in a joint particle filter framework. The filter performs sampled projections of 3D location hypotheses and scores them using features from both audio and video. On the video side, the features are based on foreground segmentation, multi-view fa...

Audio-Visual Content Analysis for Content-Based Video Indexing

An audio-visual content analysis method is presented, which analyzes both auditory and visual information sources and accounts for their inter-relations and coincidence to extract high-level semantic information. Both shot-based and object-based access to the visual information is employed. Due to the temporal nature of video, time has to be accounted for. Thus, time-constrained video labelling ...

Publication date: 2000